Method for determining competing risks

ABSTRACT

The invention concerns a method for determining competing risks for objects following an initial event based on previously measured or otherwise objectifiable training data patterns, in which several signals obtained from a learning capable system are combined in an objective function in such a way that said learning capable system is rendered capable of detecting or forecasting the underlying probabilities of each of the said competing risks.

FIELD OF THE INVENTION

The invention is directed to a method for determination of competing risks following an initial event using a learning-capable system on the basis of previously measured or otherwise objectifiable data (“training data”).

BACKGROUND OF THE INVENTION

Learning-capable systems such as neural nets are being used increasingly for risk assessment, because they are capable of recognizing and representing complex relationships between measured factors and outcomes that are not known a priori. This capability allows them to provide more reliable and/or more precise risk probability estimates than conventional procedures that are forced to assume a special form of the relationship, such as linear dependence.

In the field of medical applications, e.g., in treatment of cancer, the use of learning-capable systems such as neural nets or recursive partitioning (such as the well-known CART, “Classification and Regression Trees”, see for example: L. Breiman et al., “Classification and Regression Trees”, Chapman and Hall, New York (1984)) for assessment of the risk probability of an event is known, even for censored data. (Outcome data is known as “censored” if some events that eventually occur are not necessarily observed due to the finite observation time.) An example of the application of learning-capable systems in cancer is the task of determining, at a point in time just after primary therapy, a patient's risk probability (say, risk of future disease (relapse)), in order to support the therapy decision.

The “factors” of the data sets comprise a set of objective characteristics whose values are not influenced by the person operating the learning capable system. In the case of primary breast cancer, these characteristics may typically comprise

-   Patient age at time of surgery
-   Number of affected lymph nodes
-   Laboratory measurement of the factor uPA
-   Laboratory measurement of the factor PAI-1
-   Characteristic of tumor size
-   Laboratory measurement of the estrogen receptor
-   Laboratory measurement of the progesterone receptor.

The form of therapy actually administered can also be coded as a factor in order that the system also recognize relationships between therapy and outcome.

The values are stored on an appropriate storage medium and are presented to the learning capable system. However, as a rule, individual measurements are subject to uncertainty analogous to the noise in a measured signal. The task of the learning capable system is to process these noisy values into refined signals which provide, within the framework of an appropriate probability representation, risk assessment.

The learning capability of networks even for nonlinear relationships is a consequence of their architecture and functionality. For example, a so-called “multilayer perceptron” (abbreviated “MLP” in the literature) comprises one input layer, one hidden layer, and one output layer. The “hidden nodes” present in a neural net serve the purpose of generating signals for the probability of complex internal processes. Hence, they have the potential to represent and reveal, for example, underlying aspects of biological processes that are not directly observable, but which nonetheless are ultimately critical for the future course of a disease.

Internal biological processes can proceed in parallel, at different rates, and can also interact. Learning capable systems are capable of recognizing and representing even such internal processes that are not directly observable; in such cases, the quality of this recognition manifests itself indirectly, after learning has taken place, by virtue of the quality of the prediction of the events actually observed.

By recursive partitioning (e.g., CART), classification schemes are created that are analogous to the capabilities of neural nets in their representation of complex internal relationships.

The course of a disease may lead to distinct critical events whose prevention might require different therapy approaches. In the case of first relapse in breast cancer, for example, it is possible to classify findings uniquely into the following mutually exclusive categories:

-   1. “distant metastasis in bone tissue”
-   2. “distant metastasis but no findings in bone”
-   3. “loco-regional” relapse.

Now, once one of these events has occurred, the subsequent course of the disease, in particular the probability of the remaining categories, can be affected; hence, in a statistical treatment of such data it is often advisable to investigate just first relapses. For illustration, in the case of a breast cancer patient suffering local relapse at 24 months after primary surgery and observed with “bone metastasis” at 48 months, only category 3 is relevant if one restricts to first relapse. The follow-up information on bone metastasis would not be used in this framework, i.e., the patient is regarded as “censored” for category 1 as soon as an event in another “competing” category (here local relapse) has occurred.
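The first-relapse rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `first_event`, the dictionary layout of event times, and the use of category 0 for "censored" are hypothetical and do not come from the specification.

```python
from typing import Dict, Optional, Tuple

def first_event(event_times: Dict[int, Optional[float]],
                last_follow_up: float) -> Tuple[float, int]:
    """Return (time, category) of the first observed event.

    event_times maps a category number to its event time in months,
    or None if no event of that category was recorded.
    Category 0 (an assumption of this sketch) means "censored".
    """
    observed = {k: t for k, t in event_times.items()
                if t is not None and t <= last_follow_up}
    if not observed:
        return last_follow_up, 0
    k = min(observed, key=observed.get)  # earliest event wins
    return observed[k], k

# Patient from the text: local relapse (3) at 24 months,
# bone metastasis (1) at 48 months.
t, k = first_event({1: 48.0, 2: None, 3: 24.0}, last_follow_up=60.0)
# Indicator per category: only the first relapse counts as a failure.
delta = {c: int(c == k) for c in (1, 2, 3)}
```

With this coding the later bone metastasis is ignored, so the patient counts as a failure for category 3 only and is treated as censored for categories 1 and 2 from month 24 onward.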

Competing risks can also occur, for example, due to a patient's dying of an entirely different disease or of a side-effect of therapy so that the risk category of interest to the physician is not observed.

For one skilled in the art, it is relatively obvious that by applying an exclusive endpoint classification with a censoring rule for unrealized endpoints, the data can be projected onto a form such that for each possible endpoint, according to the prior art, a separate neural net can be trained or a classification tree can be constructed by recursive partitioning. In the example with outputs 1-3, three completely independent neural networks or three independent decision trees would need to be trained.

A problem with this use of the prior art is that detection of possible predictive value of internal nodes with respect to one of the disease outcomes is lost with respect to the remaining disease outcomes. In reality, however, an internal biological process, detected by internal nodes of a neural network, could contribute to several different outcomes, albeit with different weightings. For example, the biological “invasiveness” of a tumor has a differing but significant impact both on distant metastasis and local relapse. The separately trained nets would each need to “discover” independently the impact of an internal relationship coded in a node.

It is evident that the number of real events presented to a learning capable system is an important determinant of the detection quality, analogously to the statistical power of a system. This number is usually limited in medical applications. Hence, the probability is relatively high that an internal process will barely exceed the detection threshold with respect to one outcome but not with respect to the others. Under these circumstances, the potential impact to distinguish factor influences, as well as the biological explanatory potential of an internal node even for other outcomes, are lost.

Since therapies often have side effects, it is typical for the medical decision context that the reduction of one risk category may occur at the expense of an increase of another risk. For this, the need to train a completely new neural net for each separate risk, as required by the prior art, is unsatisfactory.

The time-varying impact of factors on outcomes can be represented according to the prior art by different nodes in the output layer corresponding to particular time-dependent functions (e.g., by the known method of fractional polynomials). Although a time-varying assessment of the hazard rate is possible according to the prior art, the problem of competing risks cannot be formulated according to the prior art without interfering with a proper assessment of time-varying hazards.

In view of the deficiencies of the prior art, the task of the invention is to provide a method for detecting, identifying, and representing competing risks according to their intrinsic logical and/or causal relationship, in particular in such a manner that determination of a time-varying assessment is not restricted.

DESCRIPTION OF THE INVENTION

This task is solved by the method according to patent claim 1.

The invention provides a method for the learning capable system to assign appropriate distinct characteristic scores to competing risks. These scores are designed to enable the estimation of the conditional probability per unit time for occurrence of the event category in question (under the premise that none of the final outcomes under consideration has yet occurred). In the sense of the invention, “appropriate” characteristic scores have the property that a maximum of the statistical likelihood is sought with respect to all outputs.

It is evident that the method of the invention applies to a broad spectrum of fields, such as engineering, economics, finance, biology, or medicine. In the case of medicine, the objects may refer to patients who, following primary disease (the initial event), are at risk for competing forms of disease relapse.

It is advantageous to utilize measurements or other objectively compiled data associated with the initial event together with follow-up observations recorded up to a specified time.

It is of advantage if the time of the most recent follow-up observation is recorded and used in the training data patterns.

The method of the invention can thus be applied within the framework of any trained learning capable system to any objective function analogous to statistical likelihood, provided that said function can be constructed from the follow-up data.

In an advantageous embodiment of the invention, failure categories are defined such that observation of one failure category implies exclusion of the other categories at time of observation. In this way, the embodiment provides a means of preferentially assessing one particular failure category.

It is advantageous to specify the objective function L in terms of a function P of the form:

${{L\left( {\mu;\left\{ {x_{j},t_{j},\delta_{jk}} \right\}} \right)} = {\prod\limits_{j = 1}^{n}{P\left( {{{{\left\lbrack {f_{{LS}{({k,x_{j}})}}\left( t_{j} \right)} \right\rbrack;}\mspace{11mu}\left\lbrack {S_{{LS}{({k,x_{j}})}}\left( t_{j} \right)} \right\rbrack};{k = 1}},\ldots\mspace{11mu},K} \right)}}},$

Here, the notation μ denotes collectively the parameters of the learning capable system. (“LS” stands for “learning capable system”.) The notation f_(LS(k,xj))(t_(j)) denotes the “failure” rate of category k, and S_(LS(k,xj))(t_(j)) denotes the expectation value of the fraction of objects j with observed characteristics x_(j) that have not suffered a failure of category k by time t_(j). P is determined by an appropriate logical relationship model, where the follow-up data is coded in the form δ_(jk), where δ_(jk)=1 if object j is observed at time t_(j) to suffer a failure of category k, else δ_(jk)=0.

It is advantageous to define the objective function in the form

$L\left( \mu;\left\{ x_{j},t_{j},\delta_{jk} \right\} \right) = \prod\limits_{j = 1}^{n}\prod\limits_{k = 1}^{K}\left\lbrack f_{LS(k,x_{j})}\left( t_{j} \right) \right\rbrack^{\varepsilon_{jk}}\left\lbrack S_{LS(k,x_{j})}\left( t_{j} \right) \right\rbrack^{\psi_{jk}}$
where ε_(jk) and ψ_(jk) are uniquely determined using the logical relationships from δ_(jk).

It is advantageous to use

$L\left( \mu;\left\{ x_{j},t_{j},\delta_{jk} \right\} \right) = \prod\limits_{j = 1}^{n}\prod\limits_{k = 1}^{K}\left\lbrack f_{LS(k,x_{j})}\left( t_{j} \right) \right\rbrack^{\delta_{jk}}\left\lbrack S_{LS(k,x_{j})}\left( t_{j} \right) \right\rbrack^{1 - \delta_{jk}}$
as the objective function.
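For illustration, the logarithm of this objective function can be evaluated as follows. This is a minimal sketch, not the embodiment's implementation: it assumes that the failure rates f and survival fractions S have already been computed by some learning capable system and are supplied as arrays of shape (n objects, K categories). The function name `log_likelihood` and the example numbers are hypothetical.

```python
import numpy as np

def log_likelihood(f, S, delta):
    """log L = sum_j sum_k [ delta_jk*log f_jk + (1 - delta_jk)*log S_jk ].

    f, S, delta: arrays of shape (n, K); delta contains only 0 or 1.
    """
    f = np.asarray(f, dtype=float)
    S = np.asarray(S, dtype=float)
    delta = np.asarray(delta)
    # Replace unused factors by 1 so that their logarithm is exactly 0;
    # this avoids 0 * log(0) when an unused f or S happens to be zero.
    log_f = np.log(np.where(delta == 1, f, 1.0))
    log_S = np.log(np.where(delta == 0, S, 1.0))
    return float(np.sum(log_f + log_S))

# Two objects, two failure categories (made-up values).
f = [[0.02, 0.01], [0.03, 0.02]]
S = [[0.90, 0.80], [0.70, 0.95]]
delta = [[1, 0], [0, 0]]  # object 1 fails in category 1, object 2 is censored
ll = log_likelihood(f, S, delta)
```

Working with the logarithm turns the double product into a double sum, which is numerically more stable when n is large.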

In a preferred embodiment, the learning capable system consists of a neural net. In this case, depending on P, the aforementioned objective function L may be expressed in the form

${L\left( {\mu;\left\{ {x_{j},t_{j},\delta_{jk}} \right\}} \right)} = {\prod\limits_{j = 1}^{n}{{P\left( {{{{\left\lbrack {f_{{NN}{({k,x_{j}})}}\left( t_{j} \right)} \right\rbrack;}\mspace{11mu}\left\lbrack {S_{{NN}{({k,x_{j}})}}\left( t_{j} \right)} \right\rbrack};{k = 1}},\ldots\;,K} \right)}.}}$

It is advantageous to use a neural network of architecture MLP (multi-layer perceptron).

In another preferred embodiment, the learning capable system carries out recursive partitioning, where

-   each object is assigned to a node,
-   to each node there is assigned the frequency or probability of all outcome categories, and
-   the partitioning is carried out such that the objective function to be optimized takes these frequencies or probabilities into account according to an appropriate statistical model.

In a preferred application, the learning capable system is used in the framework of decision support.

It is advantageous to assign values pertaining to selection of a strategy to the distinct probability functions of the competing risks. In this way, for example in the case of a medical application of the present invention, a therapy strategy may be assessed.

BRIEF DESCRIPTION OF THE DRAWINGS

In what follows, the method of the invention for determining competing risks will be further described with reference to the figures as follows:

FIG. 1 a representation of a neural network in an implementation as a multi-layer perceptron,

FIG. 2 a Venn diagram of competing risks, and

FIG. 3 an illustration of a trained neural network with three competing risks.

DESCRIPTION OF THE INVENTION

Although the embodiments described in what follows refer to medical applications, this reference is not to be construed as a limitation of any kind.

The following description utilizes the terminology of neural nets of architecture MLP. However, the application using other neural net architectures or regression trees is analogous and would be clear without further description to one skilled in the art.

In particular, the invention provides for introduction of an additional dimension of the output layer of the learning capable system, where

-   the additional dimension of the output layer comprises at least two nodes,
-   the nodes of this additional dimension correspond to the different outcome events,
-   every output node is associated with a unique signal,
-   the individual signals are each mapped to a risk function with respect to the possible event categories,
-   the signals of the output functions are combined to a total signal, and
-   the learning capable system is trained with reference to an objective function obtained from the total signal constructed from the set of all data exemplars.

A system trained in this manner supports the responsible physician and the patient, for example, in deciding to use one of several alternative or mutually exclusive therapy approaches by determining against which of the possible relapse categories therapy should be directed.

Representation of the Problem and Overview

The aim of individualized patient prognosis with competing risks may be formulated mathematically as the problem of approximating a plurality of functions f₁(x), f₂(x), f₃(x), . . . by means of a learning capable system, for example, a neural net NN₁(x), NN₂(x), . . . . More precisely, the neural net estimates the expectation value E(y_(k)|x) of the stochastic variables y_(k) conditioned on observed characteristics x:
NN_(k)(x) ≈ f_(k)(x) = E(y_(k)|x).

In a specific embodiment of the invention as a multilayer perceptron considered for the moment, the neural net can be represented schematically as illustrated in FIG. 1.

In this figure, all squares represent neurons. The neurons depicted in the upper part of the figure provide signals consisting of either

-   raw patient characteristics (e.g., in primary breast cancer, uPA, PAI-1, number of affected lymph nodes, etc.) or
-   quantities obtained by mathematically transforming these characteristics in some way (e.g., adjusted values obtained by subtracting out the mean or median of the distribution and normalizing by the standard deviation of the distribution) or
-   derived quantities obtained using prior knowledge or other statistical methods.

Together, these neurons constitute the input layer.

The middle neurons form the internal layer. However, it is also possible in the method of the invention to specify several internal layers. Each internal neuron processes the signals from the neurons that act as inputs to it and transmits a signal to the next layer. The mathematical relationship between “inputs” to the internal neurons and their “outputs” is controlled by the synaptic weights.

The neurons depicted at the bottom give estimates of the desired characteristic quantities of the model (e.g., expectation value of survival) and constitute the output layer.

Suppose that a number m of patients is available to allow the network to learn the relationships f₁(x), f₂(x), f₃(x), . . . that have been assumed to exist. To each patient, a data pattern (x,y) is assigned, where for competing risks the output variables y are understood to represent vectors (y=[y₁, y₂, y₃, . . . ]) possibly containing more than one component. The task of the net is thus to learn the underlying dynamics using the set of data patterns {(x¹,y¹), . . . , (x^(m),y^(m))}. The superscript refers to the patient index. In the learning process, a fitting of the synaptic weights takes place.

The architecture used in the embodiment consists of a classical multi-layer feed-forward net. Neurons are organized in layers as described above. Connectors exist in the embodiment as follows:

-   input layer→hidden layer
-   input layer→output layer
-   hidden layer→output layer

The use of connectors from input layer→output layer is favorable, but not obligatory for the function of the invention, because they are not necessarily required for representation of a mapping NN(x).

Operation of Neural Nets

Neurons as Functions

Each neuron receives a stimulus signal S, processes this according to a pre-specified activation function F(S), and outputs a corresponding response signal A=F(S), which is transmitted to all subsequent neurons that are still connected to said neuron. In the embodiment, the activation function of the hidden layer is the hyperbolic tangent. The invention can be operated as well using any other suitable activation function such as the logistic function.

Transformations and Input Neurons

It is favorable to apply an initial univariate transformation to the factors such that their values lie within an interval of order unity, e.g., in the embodiment
X_(j) = tanh[(x_(j) − x_(Median))/x_(Q)]  (1.a)
is used. This formula implies that first the median x_(Median) of the distribution of x is subtracted, and the values are rescaled by the factor x_(Q). Values above the median are scaled by the 75%-quartile, while values below the median are scaled by the 25%-quartile. The function tanh is then applied to the result.
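The transformation of Equation 1.a might be sketched as below. The text does not spell out how x_(Q) is computed from the quartiles, so this sketch makes an explicit assumption: x_(Q) is taken as the distance between the median and the 75% quantile for values at or above the median, and as the distance between the median and the 25% quantile for values below it. The function name `quartile_tanh` is hypothetical, and the sketch assumes both distances are nonzero.

```python
import numpy as np

def quartile_tanh(x):
    """Scale one factor to the interval (-1, 1) in the spirit of Eq. 1.a.

    Assumption of this sketch: x_Q is the median-to-quartile distance,
    chosen separately for values above and below the median.
    """
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    q25, q75 = np.percentile(x, [25, 75])
    upper = q75 - median   # scale for values at or above the median
    lower = median - q25   # scale for values below the median
    x_q = np.where(x >= median, upper, lower)
    return np.tanh((x - median) / x_q)

# Example: nine measurements of a single factor.
X = quartile_tanh(np.arange(1, 10))
```

Under this assumption the median maps to 0 and the two quartiles map to tanh(±1), so outliers are compressed toward ±1 rather than dominating the input signal.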

The input neurons have a static function and are thus implemented in the embodiment as arrays for transmitting the transformed values to the next layer. Conceptually, the hyperbolic tangent function of Equation 1.a can be regarded as the activation function of the input layer.

Hidden Neurons

The output of hidden node h for patient j is to be determined. To this end, in the embodiment a check is performed as to whether or not the hidden node h is still active. If it is active, then the input signals are multiplied by the corresponding weights to construct the sum W_(h)·x_(j). More precisely, the signal to hidden node h for pattern j is a weighted sum of inputs of the form

$z_{h}(j) = \sum\limits_{i} w_{ih} X_{i}(j),$
where w_(ih) represents the weight of the connector from input neuron i to hidden neuron h, and X_(i)(j) represents the (scaled) response of the i-th input neuron. The response of the hidden neuron h is
Y_(h)(j) = F_(h)(z_(h)(j) − b_(h))  (2.a)

Here, b_(h) is the bias of hidden neuron h, which from a computational algorithmic point of view is optimized just like any other weight of the network. In the embodiment the nonlinear activation function F_(h) is the hyperbolic tangent function.

Output Nodes

The output of output node o for patient j is to be determined. To this end, in the embodiment a check is performed as to whether or not the output node o is still active. Connectors to output nodes may be present either from the hidden layer or from the input layer. For each connector that is still active, the appropriate input signals are multiplied by the corresponding weights.

The signal z_(o) is first constructed. The bias b_(o) of the neuron is then subtracted out, and the activation function of the output neuron o is applied to this result. The output O_(o)(j) thus becomes

$\begin{matrix}{{{z_{o}(j)} = {{\sum\limits_{i}{w_{io}\left( {{X_{i}(j)} - c_{i}} \right)}} + {\sum\limits_{h}{w_{ho}{Y_{h}(j)}}}}}{{O_{o}(j)} = {F_{o}\left( {{z_{o}(j)} - b_{o}} \right)}}} & \left( {2.b} \right)\end{matrix}$

The activation function of the output layer is taken as the identity in the embodiment.
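Equations 2.a and 2.b can be combined into a single forward pass, sketched here for one patient. The sketch makes several assumptions that are not in the text: all nodes and connectors are treated as active (the activity checks are omitted), the offsets c_(i) of Equation 2.b default to zero because their definition is not given in this section, and the function name `forward` and the array shapes are hypothetical.

```python
import numpy as np

def forward(X, W_ih, b_h, W_io, W_ho, b_o, c=None):
    """Forward pass of an MLP with direct input-to-output connectors.

    X:    (n_in,)          scaled input signals X_i(j)
    W_ih: (n_in, n_hid)    weights input -> hidden
    W_io: (n_in, n_out)    weights input -> output
    W_ho: (n_hid, n_out)   weights hidden -> output
    b_h, b_o: biases; c: offsets c_i of Eq. 2.b (assumed 0 if omitted).
    """
    X = np.asarray(X, dtype=float)
    if c is None:
        c = np.zeros_like(X)
    z_h = X @ W_ih                       # weighted sum into hidden nodes
    Y = np.tanh(z_h - b_h)               # Eq. 2.a with F_h = tanh
    z_o = (X - c) @ W_io + Y @ W_ho      # Eq. 2.b, signal to output nodes
    return z_o - b_o                     # identity activation F_o

# Two inputs, two hidden nodes, two outputs; hidden path only.
X = np.array([0.5, -0.2])
out = forward(X, np.eye(2), np.zeros(2), np.zeros((2, 2)), np.eye(2), np.zeros(2))
```

In this example the direct connectors are switched off, so each output simply equals tanh of the corresponding input.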

In the embodiment, the total bias does not vary freely, but rather, in contrast to the hidden layer, the total bias is constrained such that the median signal of all output neurons vanishes. This procedure does not restrict the generality of the model in any way. It has the advantage of reducing the number of parameters to be optimized by the number of bias parameters.

Survival Analysis for Competing and Time-Varying Risks in the Context of Learning Capable Models

Relationship to a Learning Capable System

Suppose that we are given a patient collective with available covariates (prognostic factors) x_(j), which were measured at an initial time, denoted t=0 (e.g., at the time of primary surgery), as well as endpoints in time denoted t_(j). One defines δ_(jk)=1 (k=1,2,3, . . . ) if a known failure of category k is recorded for the j-th patient at time t_(j). If the patient is censored at the endpoint (no failure, further course unknown), one defines δ_(jk)=0.

Let S_(k)(t) be the expectation value of the proportion of patients having suffered no failure of category k by time t, where S_(k)(∞)=0 and S_(k)(0)=1. For each k, it is useful to define a failure rate f_(k)(t) and a “hazard function” λ_(k)(t) by

$\begin{matrix}\begin{matrix}{f_{k}(t) \equiv -\frac{\mathrm{d}S_{k}}{\mathrm{d}t}} \\{\lambda_{k}(t) \equiv \frac{f_{k}(t)}{S_{k}(t)}}\end{matrix} & \left( {3.a} \right)\end{matrix}$
so that

$\begin{matrix}{\lambda_{k}(t) = -\frac{\mathrm{d}}{\mathrm{d}t}\left\lbrack \log S_{k}(t) \right\rbrack} & \left( {3.b} \right)\end{matrix}$
holds.

The interpretation of these individual hazard rates is as follows: If it were possible to avoid failures of all other categories by a means having no effect on the failure category k in question, then f_(k)(t) would approximate the observed failure rate for category k. Now, in a real situation, f_(k)(t) will not be observed as the failure rate. However, for use of the invention within a decision support system, failure rate estimates f_(k)(t) for all categories are needed in order to determine the impact of a reduction of one failure rate on the overall well-being of the patient.

For a known form of the hazard function λ_(k)(t), one obtains the S_(k)(t) by integration of Eq. (3.b) with the initial condition S_(k)(0)=1.
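As a short worked step (standard survival-analysis algebra, not specific to the embodiment), integrating Eq. (3.b) from 0 to t with S_(k)(0)=1 gives

$S_{k}(t) = \exp\left( -\int_{0}^{t} \lambda_{k}(\tau)\,\mathrm{d}\tau \right),$

and, since f_(k) is defined as the negative time derivative of S_(k),

$f_{k}(t) = -\frac{\mathrm{d}S_{k}}{\mathrm{d}t} = \lambda_{k}(t)\,S_{k}(t).$

For the special case of a constant hazard λ_(k)(t)=λ, these expressions reduce to S_(k)(t)=exp(−λt) and f_(k)(t)=λ exp(−λt).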

At a time t after primary surgery for a patient with covariates x, we obtain from the neural net the hazard function λ_(k)(t|x), which now depends on covariates x. We express the hazard function model for given covariates x in the form
λ_(k)(t|x) = λ_(0k)(t) h_(k)(t|x)  (4)
with

$\begin{matrix}{{h_{k}\left( {t❘x} \right)} = {\exp\left\lbrack {\sum\limits_{l = 1}^{L}{{B_{l}(t)}{{NN}_{kl}(x)}}} \right\rbrack}} & (5.)\end{matrix}$

The functions B_(l)(t) are chosen to be suitable for the particular problem. One alternative is to use spline functions. In the embodiment, fractional polynomials, i.e., B_(l)(t)=t^((l−1)/2), are preferred for B_(l)(t).

One thus obtains

$\begin{matrix}{{\lambda_{0k}{\exp\left\lbrack {\sum\limits_{l}{{{NN}_{kl}(x)}{B_{l}(t)}}} \right\rbrack}} = {{- \frac{\mathbb{d}}{\mathbb{d}t}}{{\log\left( {S_{k}(t)} \right)}.}}} & (6.)\end{matrix}$

In this equation, the λ_(0k) are considered to be constant. The time dependence resides in the coefficients B_(l)(t). This model is a proportional hazards model if B₁=1 and all remaining B_(l) vanish. Deviations from proportional hazards can be modeled by including terms B_(l) with l>1.
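The hazard model of Equations 4 to 6 can be illustrated numerically. The sketch assumes fractional polynomials of the form B_(l)(t)=t^((l−1)/2) and takes the network outputs NN_(kl)(x) for one category k as a given vector; the function names `basis` and `hazard` and all numbers are hypothetical.

```python
import numpy as np

def basis(t, L):
    """Fractional polynomials B_l(t) = t**((l-1)/2) for l = 1..L.

    Returns an array whose last axis has length L.
    """
    t = np.asarray(t, dtype=float)
    return np.stack([t ** ((l - 1) / 2.0) for l in range(1, L + 1)], axis=-1)

def hazard(t, lambda0_k, nn_k):
    """lambda_k(t|x) = lambda_0k * exp( sum_l B_l(t) * NN_kl(x) ).

    nn_k: the L network outputs NN_kl(x) for one category k.
    """
    nn_k = np.asarray(nn_k, dtype=float)
    return lambda0_k * np.exp(basis(t, nn_k.shape[0]) @ nn_k)

# L = 1: proportional hazards, the hazard ratio does not depend on t.
h_prop = hazard([1.0, 9.0], 0.01, [0.5])
# L = 2: a sqrt(t) term makes the covariate effect time-varying.
h_tv = hazard(4.0, 1.0, [0.0, 1.0])
```

With only the l=1 term the hazard is the same at months 1 and 9, whereas the l=2 term lets the covariate effect grow with the square root of time.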

In a broad class of applications, an objective function of the form

$\begin{matrix}{L\left( \mu;\left\{ x_{j},t_{j},\delta_{jk} \right\} \right) = \prod\limits_{j = 1}^{n} P\left( \left\lbrack f_{NN(k,x_{j})}\left( t_{j} \right) \right\rbrack;\ \left\lbrack S_{NN(k,x_{j})}\left( t_{j} \right) \right\rbrack;\ k = 1,\ldots,K \right)} & (7.)\end{matrix}$
is optimized, where the notation indicates that P may depend (in some as yet unspecified manner) on the particular survival or failure probabilities. This dependence is a feature of the particular problem and is determined according to a logical model for the occurrence of the particular failure categories. A preferred class of objective functions of the form (7.) may be regarded as statistical likelihood functions, where for the embodiment

$\begin{matrix}{L\left( \mu;\left\{ x_{j},t_{j},\delta_{jk} \right\} \right) = \prod\limits_{j = 1}^{n}\prod\limits_{k = 1}^{K}\left\lbrack f_{NN(k,x_{j})}\left( t_{j} \right) \right\rbrack^{\varepsilon_{jk}}\left\lbrack S_{NN(k,x_{j})}\left( t_{j} \right) \right\rbrack^{\psi_{jk}}} & (8.)\end{matrix}$
is chosen. The two arguments f_(NN(k,x)) and S_(NN(k,x)) are determined uniquely under the assumption that the neural net or other learning capable model provides the appropriate output node values. This assumption is always satisfied in the embodiment.

Here, ε_(jk) and ψ_(jk) are determined from δ_(jk) according to the defined logical relationship of the failure categories, where δ_(jk)=1 if patient j suffers failure of category k at time t_(j) and otherwise δ_(jk)=0. Censored data patterns correspond to those patients for which observation ends before any failure is recorded, so that δ_(jk)=0 for all k=1,2,3, . . . . The functional dependence of the objective function on the model is denoted symbolically by the variable parameters μ. An example for determination of ε_(jk) and ψ_(jk) is given in what follows.

In the embodiment, the parameters denoted μ are the baseline hazard constants λ_(0k) and the weights of the neural network. The index j denotes the particular patient data pattern.

In the embodiment, the time integration required to solve Equation 6 for S_(k) is computed by the standard method of Romberg integration. This method allows arbitrary time dependence of the functions B_(l)(t) to be taken into account.
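A minimal sketch of this step is given below. It is a textbook Romberg scheme written out by hand, not the embodiment's code; the function names `romberg` and `survival`, the fixed number of refinement levels, and the example hazards are assumptions made for illustration.

```python
import math

def romberg(g, a, b, levels=6):
    """Integrate g over [a, b]: trapezoid rule plus Richardson extrapolation."""
    R = [[0.0] * levels for _ in range(levels)]
    h = b - a
    R[0][0] = 0.5 * h * (g(a) + g(b))
    for i in range(1, levels):
        h /= 2.0
        # Trapezoid refinement: evaluate g only at the new midpoints.
        new_points = sum(g(a + (2 * m - 1) * h) for m in range(1, 2 ** (i - 1) + 1))
        R[i][0] = 0.5 * R[i - 1][0] + h * new_points
        # Richardson extrapolation removes successive error terms.
        for j in range(1, i + 1):
            R[i][j] = R[i][j - 1] + (R[i][j - 1] - R[i - 1][j - 1]) / (4 ** j - 1)
    return R[levels - 1][levels - 1]

def survival(hazard_fn, t):
    """S_k(t) = exp(-integral of the hazard from 0 to t), with S_k(0) = 1."""
    return math.exp(-romberg(hazard_fn, 0.0, t))

# Constant hazard of 0.05 per month, evaluated at month 10.
S_const = survival(lambda u: 0.05, 10.0)
# Linearly increasing hazard 0.01*u, evaluated at month 10.
S_linear = survival(lambda u: 0.01 * u, 10.0)
```

Both examples have the cumulative hazard 0.5, so both survival values equal exp(−0.5). For hazards containing a square-root term, the extrapolation converges more slowly near t=0, and more levels may be needed.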

At the time t, let S(t) be the expectation value of the fraction of patients having experienced no failure of any of the categories k=1, . . . , K. In the embodiment, this quantity is given by the product of the individual probabilities:

$\begin{matrix}{S(t) = \prod\limits_{k = 1}^{K} S_{k}(t).} & (9.)\end{matrix}$

Specification of the Embodiment for an Example

For a complete specification of the embodiment, the quantities ψ_(jk) and ε_(jk) now need to be determined. In what follows, these functions are completely specified as an illustration for two cases of the invention embodiment that are typical for the application of the invention in the context of competing risks.

Consider a disease with three failure categories. The patient is followed up at month t (t=1, 2, . . . ). At month t, it can happen that either some combination of the three failures or no failure at all is observed; in the latter case the patient is said to be “censored.” The situation is illustrated as a Venn diagram in FIG. 2.

In the case of the disease breast cancer, the three failure categories could be bone metastasis (B for “bone”, k=1), other distant metastasis (D for “distant”, k=2), and loco-regional (L for “local”, k=3). At month t, occurrence of all three failure categories or any combination thereof is possible. However, for clinical, pharmacological, or data processing considerations, the follow-up at month t could be coded according to the following logic:

bone metastasis (present/absent)?
  If present, then ε_(j1) = 1, ε_(j2) = 0, ε_(j3) = 0, ψ_(j1) = 0, ψ_(j2) = 0, ψ_(j3) = 0
  If absent, other distant metastasis (present/absent)?
    If present, then ε_(j1) = 0, ε_(j2) = 1, ε_(j3) = 0, ψ_(j1) = 1, ψ_(j2) = 0, ψ_(j3) = 0
    If absent, then loco-regional (present/absent)?
      If present, then ε_(j1) = 0, ε_(j2) = 0, ε_(j3) = 1, ψ_(j1) = 1, ψ_(j2) = 1, ψ_(j3) = 0
      If absent, then ε_(j1) = 0, ε_(j2) = 0, ε_(j3) = 0, ψ_(j1) = 1, ψ_(j2) = 1, ψ_(j3) = 1

In other words:

In this coding of ε_(jk) and ψ_(jk), the occurrence of bone metastasis is assigned highest priority, i.e., if bone metastasis is present, then it is not recorded whether or not the other failure categories occurred by time t. Hence, according to this logic, for the observation “bone metastasis present”, the contribution of patient j to the likelihood function (8) is evidently given by the term f_(NN(1j)) (no term with S_(NN(kj))).

If the observation is “bone metastasis absent, but other distant metastasis present”, then this coding implies a contribution f_(NN(2j))×S_(NN(1j)) to the likelihood function.

If the observation is “bone and other distant metastasis absent, but loco-regional metastasis present”, then this coding implies a contribution f_(NN(3j))×S_(NN(1j))×S_(NN(2j)) to the likelihood function.

If the observation is censored, the coding implies a contributionS_(NN(1j))×S_(NN(2j))×S_(NN(3j)) to the likelihood function.
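The priority coding and the four likelihood contributions just listed can be reproduced with a short sketch. The function names `code_follow_up` and `contribution` and the numerical values of f and S are hypothetical. The exponents themselves follow the coding logic given above.

```python
def code_follow_up(bone, distant, local):
    """Return (eps, psi) for k = 1, 2, 3 under the priority logic:
    bone metastasis first, then other distant, then loco-regional."""
    if bone:
        return [1, 0, 0], [0, 0, 0]
    if distant:
        return [0, 1, 0], [1, 0, 0]
    if local:
        return [0, 0, 1], [1, 1, 0]
    return [0, 0, 0], [1, 1, 1]  # censored

def contribution(f, S, eps, psi):
    """One patient's factor in Eq. (8): prod_k f_k**eps_k * S_k**psi_k."""
    result = 1.0
    for f_k, S_k, e_k, p_k in zip(f, S, eps, psi):
        result *= (f_k ** e_k) * (S_k ** p_k)
    return result

# Made-up failure rates and survival fractions for one patient.
f = [0.02, 0.01, 0.03]
S = [0.90, 0.80, 0.70]

eps, psi = code_follow_up(bone=False, distant=False, local=True)
c_local = contribution(f, S, eps, psi)      # f_3 * S_1 * S_2
c_censored = contribution(f, S, *code_follow_up(False, False, False))
```

Because bone metastasis has highest priority, a patient with both bone and loco-regional findings contributes only f_(NN(1j)), exactly as in the first case of the text.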

The invention is also applicable to measurements in which presence/absence of each of a set of multiple failure categories at time t is always coded and taken into account, provided that the above equations are replaced by appropriate equations for the probability of observed combinations of failure types, given estimates of the separate failure category probabilities.

Structure of a Neural Net for Determination of Competing Risks

FIG. 3 shows the structure of a neural net of architecture MLP. In this case, the neural net comprises

-   an input layer with a number N_(i) of input neurons
-   at least one internal or hidden layer with a number N_(h) of hidden neurons
-   an output layer with a number N_(o) of output neurons
-   a number of directed connectors each connecting two neurons of different layers.

In the embodiment according to FIG. 3, a two-dimensional output layer is depicted in order to illustrate the capability of the invention to represent competing risks that are also time-varying. The representation is simplified for the special case of competing risks that are not time-varying, i.e., only the dimension of the failure categories is required.

The number N_(i) of input neurons initially activated usually corresponds to the number of objective factors available for the patient collective. Procedures for either reducing the number of input neurons at the outset to a number acceptable for the computational resources or for eliminating superfluous neurons during the course of optimization are available according to the prior art, so that in either case determination of the neurons actually utilized is made automatically, i.e., without any intervention of the individual operating the system.

In the embodiment according to FIG. 3, the original number of hidden nodes is determined by the original number of input neurons, i.e.,
N_(h) = N_(i)  (10.a)

In this case there exist procedures according to the, prior art enablinga favorable initialization of connector weights.

In the embodiment according to FIG. 3, the output layer neurons areorganized schematically in a two-dimensional matrix with indicesJ _(time)=1 . . . , N _(time)  (10.b)J _(key)=1, . . . , N _(key)  (10. c)where the number of originally active neurons of the output layer isgiven byN _(o) =N _(time) ×N _(key)  (10.d)

Here, the index J_(key) denotes the category of the signal, while the index J_(time) refers to the signal corresponding to the “J_(time)-th” time function (e.g., fractional polynomials or spline functions). Accordingly, an output neuron indexed by the two indices J_(time), J_(key) contributes to the determination of the coefficient of the time function signal of index J_(time) for the risk of category J_(key). In the embodiment, the indices J_(key) and J_(time) correspond schematically to the indices k and l, respectively, of Equations 4 to 7. The quantities N_(key) and N_(time) of the embodiment correspond analogously to the quantities K and L, respectively, of these equations.
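By way of illustration only, the two-index arrangement of Equations 10.b to 10.d can be sketched as a flat list of output neurons. The function names and the row-major ordering are assumptions made for this sketch and are not prescribed by the embodiment.

```python
def n_outputs(n_time: int, n_key: int) -> int:
    """Number of initially active output neurons, N_o = N_time * N_key (Eq. 10.d)."""
    return n_time * n_key


def flat_index(j_time: int, j_key: int, n_key: int) -> int:
    """Map the 1-based indices (J_time, J_key) to a 0-based flat position.

    Row-major ordering is an assumption made purely for illustration.
    """
    return (j_time - 1) * n_key + (j_key - 1)


# Example: three risk categories (N_key = 3) and two time functions (N_time = 2).
positions = [flat_index(t, k, n_key=3) for t in (1, 2) for k in (1, 2, 3)]
```

With N_(time)=1 the mapping reduces to the one-dimensional case in which only the failure category index is needed.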

For application of the invention to the case of recursive partitioning, note that there are also end nodes (also known as “leaves” of the regression “tree”), which usually (i.e., for only one risk) are numbered as a one-dimensional sequence. According to the prior art, each patient is assigned to one such node, and a node corresponds to a risk that may be regarded as a (scalar) signal. In contrast, instead of a scalar, the invention assigns to each end node a vector with N_(key) indices.

Training

For the embodiment, the purpose of learning (training) is to locate the position in parameter space with a value of the likelihood function that is as high as possible while avoiding superfluous parameters if possible. In the embodiment, training is performed by initialization, optimization steps, and complexity reduction as follows:

Initialization

Univariate Analysis

Before the entire network with all weights is trained, it is advantageous to carry out a univariate analysis for each factor. This analysis has several applications:

-   The univariate impact of the factors on a risk k or, put another way, their individual prognostic performance is available as a reference for comparison with the complete network.
-   Univariate analysis is of practical use in determining a ranking of factors for the case in which there are fewer input nodes than factors.

Univariate analysis provides a basis for initialization of weights favoring, or at least not suppressing, nonlinear configurations (see below).

In the embodiment, an exponential survival model is constructed with the single parameter consisting of the baseline hazard constant λ₀. This model is used for initialization and also serves as a reference in the subsequent analysis.
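As a minimal sketch of such a one-parameter reference model, the following assumes the standard exponential likelihood for censored data, in which each pattern contributes λ₀ raised to the event indicator times exp(−λ₀·t). Under that assumption the maximum-likelihood estimate is the number of observed failures divided by the total observation time. The function names are illustrative only.

```python
import math


def fit_baseline_hazard(times, events):
    """Maximum-likelihood estimate of a constant hazard for censored data.

    times  -- observation time of each pattern (failure or censoring time)
    events -- 1 if a failure was observed at that time, 0 if censored
    """
    return sum(events) / sum(times)


def neg_log_likelihood(lam, times, events):
    """Negative log-likelihood of the exponential model with hazard lam."""
    return -sum(d * math.log(lam) - lam * t for t, d in zip(times, events))
```

The value returned by `neg_log_likelihood` at the fitted hazard can serve as the reference performance against which richer models are compared.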

Linear Univariate Models

The value of the j-th factor X_(j) transformed according to Eq. (1a) is regarded as the single input node in a “network” consisting of exactly one linear connector (i.e., no hidden nodes) from this input neuron to an output neuron k. The time variation of this output node corresponds to the “proportional hazards model” for censored data. The resulting model has only two free parameters: the baseline hazard constant (λ₀) and the weight associated with the connector. These are optimized for risk k and their values stored in a table for subsequent reference, together with the performance (likelihood) and the statistical significance.

Nonlinear Univariate Models

Next, for each factor, a four-parameter nonlinear univariate model is optimized. Here, the value X_(j) resulting from the transformation of the j-th factor is considered as the “input neuron.” The univariate network now consists of this one input neuron, one single hidden neuron, and one output neuron (without a linear connector between input and output neuron). The time-variation of this output node corresponds to a “proportional hazards model” (K=1) for censored data.

The four parameters correspond respectively to the baseline hazard constant (λ₀), the weight and bias to the hidden neuron, and the weight of the connector to the output layer. These values are optimized and stored in a table for subsequent use together with the performance (likelihood) and significance.
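The four-parameter univariate model can be sketched as follows. This is a hedged illustration: the tanh activation of the hidden neuron, the exponential link from output score to hazard, and the constant-in-time hazard are assumptions of the sketch, since the exact functional form is fixed by equations outside this passage.

```python
import math


def hazard(x, lam0, w_in, b, w_out):
    """Assumed hazard for input x: lam0 * exp(w_out * tanh(w_in * x + b))."""
    hidden = math.tanh(w_in * x + b)       # single hidden neuron
    return lam0 * math.exp(w_out * hidden)  # proportional-hazards style link


def neg_log_likelihood(params, xs, times, events):
    """Negative log-likelihood of the univariate model on censored data.

    params -- (lam0, w_in, b, w_out), the four free parameters
    events -- 1 for an observed failure, 0 for a censored observation
    """
    lam0, w_in, b, w_out = params
    total = 0.0
    for x, t, d in zip(xs, times, events):
        h = hazard(x, lam0, w_in, b, w_out)
        total -= d * math.log(h) - h * t
    return total
```

Setting `w_out` to zero recovers the one-parameter exponential model, which is why that model is a natural reference for the likelihood comparison.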

Input Variable Ranking

After the univariate models have been determined for each factor, the factors significant in univariate analysis are ranked according to the absolute values of their linear weights. The numbering of input nodes for the subsequent full analysis corresponds to this ranking. If fewer input nodes than factors are available, this procedure allows an objective pre-selection of the “most important” factors.
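The ranking step may be illustrated by the following sketch. The table layout (factor name mapped to linear weight and p-value), the significance threshold `alpha`, and the function name are assumptions introduced for illustration.

```python
def rank_factors(univariate, alpha=0.05, max_inputs=None):
    """Rank significant factors by the absolute value of their linear weight.

    univariate -- dict mapping factor name to (linear_weight, p_value)
    alpha      -- assumed significance threshold (not specified in the text)
    max_inputs -- optional cap when fewer input nodes than factors are available
    """
    significant = [(name, w) for name, (w, p) in univariate.items() if p < alpha]
    significant.sort(key=lambda item: abs(item[1]), reverse=True)
    names = [name for name, _ in significant]
    return names if max_inputs is None else names[:max_inputs]
```

The position of a factor in the returned list would then give the number of its input node in the full analysis.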

Initialization of Weights

For net optimization (training), it is necessary to set initial values of the weights. Setting weights to exactly zero is unsatisfactory. In the embodiment, the weights of the linear connectors are initialized to random small values in the usual way. The baseline hazard constant is initialized to the value λ₀ determined from the 1-parameter model. The number H of hidden nodes is taken equal to the number J of input nodes. The connector from the input neuron j to the hidden neuron with the same index h=j is now initialized to the weight determined from the “nonlinear univariate model” described above. The associated bias is initialized analogously to the corresponding bias of the nonlinear univariate model. These two quantities are then shifted by a small random amount. Hence, by construction, the output of each hidden node corresponds approximately to an optimized nonlinear value.

For each hidden node h, the value of the weight obtained by the aforementioned univariate optimization, denoted here as w_(h1), to the first neuron of the output layer is also available. Now, in order to initialize the weights to the output layer, the quantities w_(h1), h=1, . . . , H are weighted by H random numbers: In the embodiment a random partition of unity is generated by first sampling H random numbers from a uniform distribution [0,1] and then dividing by their sum; i.e., the resulting numbers sum to 1. These and all other connectors (i.e., weights from the hidden layer to neurons of the output layer with k=2, etc.) are shifted by a small random amount.
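The random partition of unity described above can be sketched as follows. The function names and the size of the random shift (`jitter`) are assumptions; the text specifies only that the shift is small.

```python
import random


def random_partition_of_unity(h, rng):
    """Sample h uniform numbers and divide by their sum so that they sum to 1."""
    draws = [rng.random() for _ in range(h)]
    total = sum(draws)
    return [d / total for d in draws]


def init_output_weights(w_h1, rng, jitter=0.01):
    """Weight the univariate output weights w_h1 by a random partition of unity.

    jitter is an assumed magnitude for the small random shift.
    """
    parts = random_partition_of_unity(len(w_h1), rng)
    return [w * p + rng.uniform(-jitter, jitter) for w, p in zip(w_h1, parts)]
```

Passing a seeded `random.Random` instance makes the initialization reproducible between training runs.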

An alternative procedure that is commonly used in the prior art for initialization of neural net training consists of assigning small random weights to all connectors. This procedure results in an initial configuration in which all connectors, including those leading into the hidden layer, are in the linear regime, i.e., for small arguments, the “activation function” is nearly linear; for example tanh(x)≈x for small values of x.

Linear Statistics of the Input Factors

In the embodiment, the covariance matrix of all input factors is computed and saved; a linear regression of each factor on all the others (i.e., X₂≈A X₁+B) is also computed and saved; eigenvectors and eigenvalues of the covariance matrix are also computed and saved; all these computations are written to a protocol. Moreover, these linear relationships are used for various pruning procedures in the embodiment.
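A minimal sketch of the covariance and pairwise regression computations is given below. It uses the sample covariance with divisor n−1 and the ordinary least-squares fit of one factor on one other; both choices, and all function names, are assumptions of the sketch. The eigenvector computation is omitted here.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long columns (divisor n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def covariance_matrix(columns):
    """columns maps factor name to its list of values; returns a nested dict."""
    names = list(columns)
    return {a: {b: covariance(columns[a], columns[b]) for b in names} for a in names}


def simple_regression(xs, ys):
    """Least-squares coefficients (A, B) of the fit ys ≈ A * xs + B."""
    a = covariance(xs, ys) / covariance(xs, xs)
    b = mean(ys) - a * mean(xs)
    return a, b
```

Strongly related factor pairs found in this way are natural candidates for the pruning procedures mentioned above.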

Assignment of Patient Data to Training and Validation Sets

For a learning capable system, it is common to split the set of available patterns by random selection into training, validation, and generalization sets. In the embodiment, the user can specify percentages (including zero) of the entire pattern set to be reserved for validation and generalization, respectively. The generalization set is not taken into account for training at all, in order to enable a completely unbiased subsequent test of performance on these patterns. The performance on the validation set, if present, is tested repeatedly in the course of optimization: The performance on the validation set provides an independent measure of the progress of optimization, which is based otherwise on the training set performance alone, and testing this additionally serves to avoid over-training.
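The random assignment can be sketched as follows, assuming that patterns are identified by their index and that percentages are rounded to whole patterns. The function name and return layout are illustrative.

```python
import random


def split_patterns(n_patterns, pct_validation, pct_generalization, rng):
    """Randomly split pattern indices into training, validation and generalization sets.

    Percentages refer to the entire pattern set and may be zero.
    """
    indices = list(range(n_patterns))
    rng.shuffle(indices)
    n_val = round(n_patterns * pct_validation / 100)
    n_gen = round(n_patterns * pct_generalization / 100)
    validation = indices[:n_val]
    generalization = indices[n_val:n_val + n_gen]
    training = indices[n_val + n_gen:]
    return training, validation, generalization
```

Setting both percentages to zero assigns every pattern to the training set, which corresponds to the case in which no validation set is used.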

Selection of Factors

In the embodiment, there is an option to restrict consideration to a pre-specified subset of factors, for example in order to obtain models applicable to future patterns in which only this factor subset is available.

Net Optimization

Simplex Optimization

Optimization involves a search for a maximum of the likelihood function with respect to the data of the training set. The parameter space for the search consists of the n−K net weights that are still active together with the K global baseline hazard constants λ_(0k), k=1, . . . , K. This results in an n-dimensional search space.

The search method implemented in the embodiment is the simplex method of Nelder and Mead (1965), known from the prior art, which requires the construction of an n-dimensional simplex in parameter space. A simplex is uniquely determined by specification of n+1 non-degenerate vertices, i.e., the corresponding edges are all mutually linearly independent. A simplex thus bounds an n-dimensional point-set in parameter space. The optimization search is conducted in iteration steps known as “epochs”. During each epoch, the performance on the training set is computed by evaluation of the objective function at several “locations” in parameter space, that is, at the current reference vertex position and at n additional vertices, which are determined by composition of mathematical operations such as reflection, expansion/contraction in a direction, etc. The directions in parameter space associated with these operations are automatically determined based on the characteristic performance values on the vertices of the preceding epoch, and a new reference vertex is determined. In the embodiment, the performance at the reference vertex is a monotonic function of the epoch (up to machine accuracy), and the search terminates at a point that is at least a local minimum (i.e., of the negative of the function to be maximized).
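For orientation, a simplified textbook variant of the Nelder-Mead search is sketched below. It is not the implementation of the embodiment: the coefficients (reflection 1, expansion 2, contraction and shrink 0.5), the axis-aligned initial simplex, and the termination test on the spread of function values are conventional choices assumed here. The function `f` would be the negative log-likelihood.

```python
def nelder_mead(f, x0, step=0.5, max_epochs=500, tol=1e-10):
    """Minimize f from start point x0 with a basic Nelder-Mead simplex search."""
    n = len(x0)
    # n + 1 vertices: the start point plus one step along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]

    for _ in range(max_epochs):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(coef):
            # Point on the line through the worst vertex and the centroid.
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = along(2.0)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(-0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_i = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_i], values[best_i]
```

Because the best vertex is only ever replaced by a better one, the value at the best vertex cannot increase from one epoch to the next, which matches the monotonic behavior noted above.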

Utilization of the Validation Set

If present, the aforementioned validation set serves as a check of the progress of optimization and avoidance of over-training.

In the embodiment, the negative log-likelihoods per pattern on the training and validation sets are continually computed and archived as characteristic measures of the performance on these two sets at the current optimization epoch. Although this characteristic decreases monotonically on the training set as a consequence of the simplex method, temporary fluctuations of the corresponding characteristic can occur on the validation set even if over-training has not yet occurred. However, if a steady increase of the characteristic on the validation set occurs, it is advantageous to stop further optimization (training) and to follow with a round of complexity reduction. This form of stopping criterion represents a kind of “emergency brake” for avoidance of over-training.

The embodiment provides for an automatic stopping criterion by defining and monitoring at each epoch an exponentially smoothed performance characteristic on the validation set. If this smoothed characteristic exceeds the previously attained minimum (i.e., if the performance worsens) by a pre-specified percentage, the optimization is automatically stopped. A tolerated percentage increase of 1% has been determined for a typical training set size of about 300 or more data patterns. For this tolerance, assuming that training and validation sets are about the same size, the stopping condition for training is more often triggered by attainment of an absolute minimum on the training set than by the worsening of the performance on the validation set. This “normal” stopping criterion is preferred because an (almost) monotonic improvement of performance on the validation set is an indicator that the neural network has recognized true underlying structures, rather than merely random noise.

No validation set is used in the example of the embodiment. In this case, the stopping criterion is just the attainment of a minimum on the training set.
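The automatic stopping criterion can be sketched as follows. The smoothing constant and the function name are assumptions, as the text specifies only that the characteristic is exponentially smoothed and that a 1% tolerance was found suitable.

```python
def stopping_epoch(validation_losses, smoothing=0.1, tolerance_pct=1.0):
    """Return the first epoch at which training would be stopped, or None.

    validation_losses -- negative log-likelihood per pattern at each epoch
    smoothing         -- assumed weight of the newest value in the smoothing
    tolerance_pct     -- tolerated percentage increase over the attained minimum
    """
    smoothed = None
    best = None
    for epoch, loss in enumerate(validation_losses):
        if smoothed is None:
            smoothed = loss
        else:
            smoothed = smoothing * loss + (1.0 - smoothing) * smoothed
        if best is None or smoothed < best:
            best = smoothed  # new minimum of the smoothed characteristic
        elif smoothed > best + abs(best) * tolerance_pct / 100.0:
            return epoch
    return None
```

Smoothing keeps a single noisy epoch from triggering the stop, while a sustained worsening on the validation set does.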

Structure Optimization and Complexity Reduction

The result of the simplex optimization described for the embodiment is a set of weights {w_([1]), . . . , w_([n])} and other parameters determining a local minimum of the negative log likelihood. (The numbering [1] . . . [n] of the weights need not correspond to their topological ordering.) This minimum refers to the particular set of n fixed weights in their particular topology. Now, in order to avoid over-fitting, it is desirable to reduce the complexity by pruning weights, as long as this pruning does not result in a significant loss of performance.

Pruning denotes the deactivation of connectors. To this end, the weights of said deactivated connectors are “frozen” to a fixed value (in the embodiment, the fixed value is zero, so that one may also speak of “removing” weights). It is possible in principle to remove individual weights or even entire nodes. In the latter case, all weights leading to or from the node to be pruned are deactivated.

In the embodiment, a phase of complexity reduction is carried out in the network immediately following an optimization phase (simplex procedure). The first step of this complexity reduction phase is “pruning” of individual connectors. Next, combinations of different connectors are tested for redundancy. Finally, the consistency of the topology is checked, and those connectors and/or nodes are removed that, due to prior removal of other connectors and nodes, no longer contribute to the output. This procedure is not the subject of the invention, but represents good practice according to the state of the art.

In the embodiment, various statistical hypotheses are automatically constructed for complexity reduction, which are tested by means of a likelihood ratio test with respect to a pre-specified significance level. Certain weights and parameters are considered to be mandatory, i.e., they are not subject to removal. In the embodiment, these include the global baseline hazard constants λ_(0k).

Connector Ranking

In order to determine the order in which to test the connectors, a test statistic log(likelihood ratio) is constructed in the embodiment. Here, for each weight w_([A]), one considers two networks:

-   The net with all current weights (n degrees of freedom), including w_([A]).
-   The net with all current weights except for w_([A]), which is deactivated (n-1 degrees of freedom).

In the net with w_([A]) deactivated, the remaining weights are considered to be fixed at their current optimized values.

Testing

In the embodiment, after a ranking {w_([1]), . . . , w_([n])} of the weights according to the “likelihood ratio” has been recorded, the weights are tested in this order for pruning, until a specified maximum of G_(max) weights have been chosen for removal. Denoting by A-1 the number of weights already removed, two hypotheses are tested to determine whether an additional A-th weight w_([A]) is to be removed.

-   Test statistic for the hypothesis H_(A-1): Likelihood ratio for net with weights {w_([1]), . . . , w_([A-1])} deactivated (n-A+1 degrees of freedom)
-   Test statistic for the hypothesis H_(A): Likelihood ratio for net with weights {w_([1]), . . . , w_([A])} deactivated (n-A degrees of freedom)

The hypothesis H_(A) is now tested twice:

-   H_(A) versus H_(A-1) and
-   H_(A) versus H.

The significance of w_([A]) is tested by application of a chi-squared test with respect to the likelihood ratio. If H_(A) is accepted in either of the comparisons (pruning A leads to significantly worse fit), then the connector A is retained, and the pruning step is terminated.

In deactivation, the connector is removed from the list of active connectors and its corresponding weight is frozen (usually to zero).

In the embodiment, the number G of connectors removed during a pruning phase is limited to a maximum of G_(max)=n/10, where n is the number of remaining connectors.
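The chi-squared comparison for removal of a single weight (one degree of freedom) can be sketched as follows. The sketch covers only the comparison between two nets that differ by one weight; the significance level `alpha` and the function names are assumptions. For one degree of freedom the upper-tail probability of the chi-squared distribution equals erfc(sqrt(x/2)).

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def removal_is_tolerable(nll_with, nll_without, alpha=0.05):
    """Likelihood ratio test for deactivating one weight.

    nll_with    -- negative log-likelihood of the net including the weight
    nll_without -- negative log-likelihood with the weight deactivated
    Returns True if the loss of fit is not significant at level alpha.
    """
    statistic = max(2.0 * (nll_without - nll_with), 0.0)
    return chi2_sf_1df(statistic) > alpha
```

A weight whose removal is not tolerable in this sense would be retained and the pruning step terminated.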

Further Complexity Reduction

In the embodiment, further connectors are removed by pairwise analysis of weights and their relationship to the likelihood of the data, taking into account various correlation properties. However, this step is by no means compulsory for the function of a learning capable model and can be omitted. Alternative embodiments of the invention can be combined with alternative or additional techniques of complexity reduction that may be already implemented in various learning capable systems.

Topology Check

Pruning or removal of individual connectors can result in isolation of a node either from all input signals, all output signals, or (in the case of a hidden neuron) from both. In any of these cases a deactivation flag is set in the embodiment for the node in question. For output layer neurons, “isolation” means that there are no active connectors into the node: neither from the input layer, nor from the hidden layer. If all connectors from an input neuron to the hidden and output layers have been removed, then the bias of the linear connectors is also deactivated.

A hidden neuron that has been isolated from all inputs can still be connected to outputs. However, the “frozen” contribution of such a hidden neuron to the output is redundant because its only effect is to modify the bias values of the remaining active connectors. Hence, such neurons are deactivated, and any remaining connectors to the output layer are removed.

These various checks can themselves lead to isolation of further nodes. For this reason, the procedure is iterated until the topology remains constant.
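The iterated topology check for hidden neurons can be sketched as follows. Representing connectors as (source, target) pairs is an assumption of the sketch, and the handling of biases and of isolated input or output neurons is omitted.

```python
def prune_isolated(connectors, hidden):
    """Remove connectors of hidden neurons that lack an active input or output.

    connectors -- iterable of (source, target) pairs of active connectors
    hidden     -- set of names of hidden neurons
    The check is repeated until the set of active connectors no longer changes.
    """
    active = set(connectors)
    while True:
        has_input = {tgt for _, tgt in active}
        has_output = {src for src, _ in active}
        dead = {h for h in hidden if h not in has_input or h not in has_output}
        remaining = {(s, t) for s, t in active if s not in dead and t not in dead}
        if remaining == active:
            return active
        active = remaining
```

The loop matters when hidden neurons feed other hidden neurons: removing one isolated neuron can isolate the next, so a single pass would not suffice.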

Termination of Training and Output

In the embodiment of the invention, if no further complexity reduction is possible following the last simplex optimization, training is terminated. The final values of all weights and other parameters are then fixed and archived in files created for this purpose.

Thus, the trained neural network is uniquely determined. By reading in these archived values of weights and other parameters (either immediately or at any later time), the trained neural net can be used according to the above description to reconstruct, for arbitrary data containing values of the independent variables (“covariates”) x, the output scores and thus the previously defined functions f_(k)(t), λ_(k)(t), and S_(k)(t) associated with these covariates x. With these functions, the probability model is determined.

In particular, it is of course possible to compute the dependence of the form of said functions on the values of selected factors. A computation of this dependence is useful in order to evaluate the expected effect of a therapy concept, if the therapies to be evaluated were used as “factors” in training the learning capable system.

EXAMPLE

Data

In order to illustrate the operation of the invention in the embodiment, 1000 synthetic patient data patterns were first generated containing 9 explanatory factors (covariates) by means of a random sample generator. The first seven of these factors were sampled as realizations of a multivariate normal distribution. The means and variances for the example were specified thus:

Factor     xlypo   xer     xpr     xage    xtum    xupa    xpai
Mean       0.50    0.45    0.45    0.50    0.51    0.50    0.50
Variance   0.071   0.087   0.097   0.083   0.083   0.084   0.083

The assumed covariance matrix was

        xlypo   xer     xpr     xage    xtum    xupa    xpai
xlypo    1.00   −0.06   −0.09    0.03    0.42    0.02    0.05
xer     −0.06    1.00    0.54    0.29   −0.07   −0.18   −0.19
xpr     −0.09    0.54    1.00    0.03   −0.06   −0.07   −0.14
xage     0.03    0.29    0.03    1.00    0.04    0.02    0.00
xtum     0.42   −0.07   −0.06    0.04    1.00    0.03    0.06
xupa     0.02   −0.18   −0.07    0.02    0.03    1.00    0.54
xpai     0.05   −0.19   −0.14    0.00    0.06    0.54    1.00

In order to represent as realistic a situation as possible, these values were chosen to be of the same order of magnitude as values known from the scientific literature for certain factors used in the case of breast cancer. However, for the function of the invention, the precise values assumed as well as the interpretation of the factors are completely immaterial.

In addition to the seven aforementioned factors, two further binary factors (“therapies”) denoted “ct” and “ht” were randomly generated. For ht, 50% of the patients were randomly assigned the value 1 and the remaining 50% the value 0. In the example, only 1% of the patients were assigned ct=1, the rest zero. Hence, it is to be expected that ct would not be detected as a significant factor by the neural net.

The first ten resulting patterns are as illustrated:

Patient
Number   xlypo   xer    xpr    xage   xtum   xupa    xpai    ct   ht
1         0.07   0.89   1.41   0.36   0.49    0.31    0.22   0    1
2         0.25   0.23   0.98   0.15   0.10    0.31    0.05   0    0
3         0.56   0.52   0.79   0.09   0.22   −0.22   −0.07   0    1
4         0.61   0.83   1.10   0.73   0.56    0.21    0.44   0    1
5         0.97   0.38   0.70   0.61   0.51    0.97    0.72   0    0
6         0.44   0.22   0.07   0.90   0.80    0.60    0.55   0    1
7         0.46   0.24   0.47   0.14   0.60    0.57    0.31   0    0
8         0.42   0.60   0.41   0.38   0.54    0.23    0.47   0    0
9        −0.01   0.22   0.80   0.52   0.38   −0.13    0.41   0    0
10        0.80   0.41   0.19   0.11   0.45    0.40    0.51   0    0

For the influence of the factors on disease course, three independent risk hazards denoted risk(i), i=1, . . . , 3, were first generated. The following model was assumed:

-   risk(1)=exp(r₁+r₂+r₃+r₄−r_(h))
-   risk(2)=exp(r₁+r₃+r₄)
-   risk(3)=exp(r₁)

with

-   r₁=2(xlypo−median(xlypo))
-   r₂=0.5(xtum−median(xtum))
-   r₃=0.75(xupa−median(xupa))
-   r₄=1.5(xpai−median(xpai)) and
-   r_(h)=1 if ht=1.
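The assumed risk model can be written out as a short sketch. The dictionary layout of a pattern, the function name, and the reading that r_(h)=0 when ht=0 are assumptions; the medians would be taken over the full set of 1000 synthetic patterns, which is not reproduced here.

```python
import math


def risks(pattern, medians):
    """Return (risk(1), risk(2), risk(3)) for one pattern under the assumed model.

    pattern -- dict with keys xlypo, xtum, xupa, xpai, ht
    medians -- dict with the median of each continuous factor
    """
    r1 = 2.0 * (pattern["xlypo"] - medians["xlypo"])
    r2 = 0.5 * (pattern["xtum"] - medians["xtum"])
    r3 = 0.75 * (pattern["xupa"] - medians["xupa"])
    r4 = 1.5 * (pattern["xpai"] - medians["xpai"])
    rh = 1.0 if pattern["ht"] == 1 else 0.0  # assumption: r_h = 0 when ht = 0
    return (
        math.exp(r1 + r2 + r3 + r4 - rh),
        math.exp(r1 + r3 + r4),
        math.exp(r1),
    )
```

A pattern lying exactly at the medians with ht=0 yields a risk of 1 in all three categories, so the risks act as multipliers relative to a median patient.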

Using these risk values, true failure times of the three risk categories were generated by random sampling from exponential distributions or modified exponential distributions with a base time constant of 200 months. It was additionally assumed that if failures of the 3^(rd) category do occur, then at the latest by the 24^(th) month, in order to simulate a situation with competing risks analogous to loco-regional relapse in breast cancer. These data were censored according to a simulated “study”, and an “observation” was simulated according to the priority scheme of FIG. 1.

It follows from the model assumed in the example that for the third failure category, only the factor “xlypo” has a causal effect. Nonetheless, there is an indirect relationship between the remaining factors and the observation of failures of the third failure category, because an increased risk of the other failure categories resulting from other factors can reduce the probability of observing a failure of the third category. Although this characteristic of the assumed model is immaterial for the function of the invention, it illustrates a potential benefit.

Trained Neural Net

The neurons of the output layer are arranged according to Equations 4 to 7 and 10 with N_(time)=1 and N_(key)=3, so that 3 neurons of the output layer are initially active. For the example, 9 input neurons and an equal number of hidden neurons are initially activated. The neural net trained according to the methods described above is illustrated in FIG. 3 (“xpai” and “xpai1” are identical). Note that only one connector leads to the output node “O3”, which originates from the node (neuron) “xlypo”. Here, the outputs O1 to O3 correspond to the risks “risk(1)” to “risk(3)”, respectively.

A complete and unique representation of the trained neural net is determined by specifying the remaining connectors with their corresponding weights and biases as well as the baseline hazard constants. To demonstrate this, Table 2a lists each neuron that receives an active connector (target neuron, “tgt”) and all sources (“src”) with their corresponding weights (“wt”). Note that many of the original connectors are inactive.

TABLE 2a

tgt   src (wt)
h1    ht (13.5)
h6    xlypo (0.53), xupa (−1.78), xtum (1.02)
h7    xer (1.98), xpr (−1.37)
h8    xage (1.70)
h9    xpr (2.31)
o1    h1 (−1.70), h6 (0.30), ht (−1.10), xlypo (0.19), xpai (0.72), xupa (0.83), xtum (0.22)
o2    h1 (2.03), h6 (−0.66), h7 (−0.86), h8 (0.33), h9 (−0.64), xlypo (0.64), xpai1 (0.91), xer (0.56), xage (−0.42)
o3    xlypo (2.39)
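To illustrate that the connector list alone fixes the topology, Table 2a can be transcribed into a nested mapping. The data structure and helper name are assumptions of this sketch; “xpai1” is entered as “xpai” (stated above to be identical), and the source printed as “xlpo” is read as “xlypo”.

```python
# Table 2a as target -> {source: weight}; values transcribed from the table.
CONNECTORS = {
    "h1": {"ht": 13.5},
    "h6": {"xlypo": 0.53, "xupa": -1.78, "xtum": 1.02},
    "h7": {"xer": 1.98, "xpr": -1.37},
    "h8": {"xage": 1.70},
    "h9": {"xpr": 2.31},
    "o1": {"h1": -1.70, "h6": 0.30, "ht": -1.10, "xlypo": 0.19,
           "xpai": 0.72, "xupa": 0.83, "xtum": 0.22},
    "o2": {"h1": 2.03, "h6": -0.66, "h7": -0.86, "h8": 0.33, "h9": -0.64,
           "xlypo": 0.64, "xpai": 0.91, "xer": 0.56, "xage": -0.42},
    "o3": {"xlypo": 2.39},
}


def sources_of(target):
    """Names of all neurons with an active connector into the target neuron."""
    return sorted(CONNECTORS.get(target, {}))


# Total number of active connectors remaining after training and pruning.
n_active = sum(len(srcs) for srcs in CONNECTORS.values())
all_sources = {src for srcs in CONNECTORS.values() for src in srcs}
```

In this transcription the factor “ct” does not appear as a source of any connector, and the only source of “o3” is “xlypo”.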

The biases are given in Table 2b:

TABLE 2b Bias values (automatically 0 for inactive neurons)

ht      xlypo   xpai    xupa    xtum    ct      xer     xage    xpr
0.17    0.16    0       0       0       0       0       0       0

h1      h2      h3      h4      h5      h6      h7      h8      h9
−0.94   0       0       0       0       0.66    1.31    0       2.07

o1      o2      o3
1.03    0.66    −0.11

Finally, the values of the baseline hazard constants λ_(0k) required for specification of the model of Equation 6 may be read off from Table 2c (the units of these constants correspond to the aforementioned time constant of 200 months):

TABLE 2c

λ₀₁        λ₀₂        λ₀₃
0.53/200   0.13/200   0.27/200

Time-Varying Hazards

Output neurons for time-varying hazards could be included by replacing the parameter N_(time)=1 as used here by a higher value of N_(time). The number of output neurons is then determined by Equation 10.d. For example, if N_(key)=3 and N_(time)=2, one would then have N_(o)=6. Training would proceed as described previously. If present, the separate time-varying hazards associated with the different risk categories could then be determined independently using the model of Equations 4 to 7, and in particular, the problem of determining competing risks would in no way be restricted.

1. Method of determining a therapy decision regarding a medical patient based on a likelihood of outcomes considering potential post-therapy conditions, comprising: determining an objective function of a trained neural network based on patient events and observations for a general class of medical condition, wherein the previously-measured patient events and observations are objectifiable, and wherein the neural network includes an input layer having a plurality of input neurons that receive input data, at least one internal layer having a plurality of hidden neurons, an output layer having a plurality of output neurons that provide output signals, and a plurality of connectors, wherein each said connector interconnects two neurons of different layers, defining a sending direction from the input layer toward the output layer, recognizing an occurrence of a particular medical condition belonging to the general class of medical condition, as exhibited by the medical patient; providing an input to the trained neural network corresponding to the particular medical condition; combining a plurality of values generated by the trained neural network according to the objective function to produce one or more outputs; determining underlying probabilities of known potential post-therapy conditions of the particular medical condition based on the one or more outputs of the trained neural network; and determining the therapy decision for the medical patient based on the determined underlying probabilities; wherein the objective function L is expressed in terms of a function P: $L\left(\mu; \{x_j, t_j, \delta_{jk}\}\right) = \prod_{j=1}^{n} P\left(\left[f_{LS(k,x_j)}(t_j)\right]; \left[S_{LS(k,x_j)}(t_j)\right]; k = 1, \ldots, K\right)$ where μ denotes the parameters of the trained neural network, f_(LS(k,xj))(t_(j)) is a failure rate of category k, S_(LS(k,xj))(t_(j)) is an expected proportion of medical patients j with observed factors x_(j) not having experienced a failure of category k by time t_(j), and P is determined from δ_(jk) by a logical relationship, wherein δ_(jk)=1 if medical patient j has experienced failure of category k at time t_(j) and otherwise δ_(jk)=0; and wherein $L\left(\mu; \{x_j, t_j, \delta_{jk}\}\right) = \prod_{j=1}^{n} \prod_{k=1}^{K} \left[f_{LS(k,x_j)}(t_j)\right]^{\varepsilon_{jk}} \left[S_{LS(k,x_j)}(t_j)\right]^{\psi_{jk}}$ is the objective function, where ε_(jk) and ψ_(jk) are determined from δ_(jk) on the basis of logical relationships.
2. Method according to claim 1, further comprising applying therapy to the medical patient according to the therapy decision.
3. Method according to claim 1, further comprising using objectively-obtained data pertaining to the particular medical condition and to follow-up observation of the medical patient during a predetermined time period to modify the patient events and observations.
4. Method according to claim 3, further comprising applying the modified patient events and observations to the objective function.
5. Method according to claim 3, further comprising using a time of a last follow-up observation to modify the patient events and observations.
6. Method according to claim 1, further comprising basing the patient events and observations at least in part on observations of failure categories, wherein the failure categories are observed individually while other categories of events are excluded.
7. Method according to claim 1, wherein $L\left(\mu; \{x_j, t_j, \delta_{jk}\}\right) = \prod_{j=1}^{n} \prod_{k=1}^{K} \left[f_{LS(k,x_j)}(t_j)\right]^{\delta_{jk}} \left[S_{LS(k,x_j)}(t_j)\right]^{1 - \delta_{jk}}$ is the objective function.
8. Method according to claim 1, further comprising carrying out recursive partitioning for a plurality of medical patients by the neural network, including assigning each medical patient of the plurality of medical patients to a respective node of the neural network, assigning at least one of a frequency and a probability of each failure category of a set of predetermined failure categories to each said node, and statistically optimizing the objective function based at least in part on the at least one of a frequency and a probability.
9. Method according to claim 1, wherein the neural network is used in the framework of a decision support system.
10. Method according to claim 1, wherein values for the determination of the therapy decision are put in correspondence with different probability functions of the potential post-therapy conditions.
11. Method according to claim 1, wherein the patient events and observations are previously-measured.